Data Source

IPUMS Health Surveys: NHIS is a harmonized set of data covering more than 50 years (1963-present) of the National Health Interview Survey (NHIS). The NHIS is the principal source of information on the health of the U.S. population, covering such topics as general health status, the distribution of acute and chronic illness, functional limitations, access to and use of medical services, insurance coverage, and health behaviors. On average, the survey covers 100,000 persons in 45,000 households each year. The IPUMS NHIS facilitates cross-time comparisons of these invaluable survey data by coding variables identically across time. Our analysis will use data from 2015 to 2021, which covers the COVID-19 period.

Data cleaning

We pulled out data from IPUMS Health Surveys: NHIS and will limit our analysis using data from 2015 to 2021. To analyze the trend of anxiety prevalence, frequency and level from 2015 to 2021, we will focus on anxiety indicators listed below:

  • WORFREQ:How often feel worried, nervous, or anxious
  • WORRX: Take medication for worried, nervous, or anxious feelings
  • WORFEELEVL: Level of worried, nervous, or anxious feelings, last time

To analyze the trend of depression prevalence, frequency and level from 2015 to 2021, we will focus on depression indicators listed below:

  • DEPRX:Take medication for depression
  • DEPFREQ:How often feel depressed
  • DEPFEELEVL: Level of depression, last time depressed

Core demographic and Social economic status indicators listed below are also included in this analysis:

  • AGE:Age
  • SEX:Biological sex
  • MARST:Current marital status
  • POVERTY:Ratio of family income to poverty threshold

Responses indicate Unknown or not applied are excluded from our analysis.

anx_dep = 
  read_csv("data/nhis_data01.csv") %>% 
  janitor::clean_names() %>% 
  filter(year>=2015) %>% 
  select(year, worrx, worfreq, worfeelevl, deprx, depfreq, depfeelevl, age, sex, marst, poverty) %>% 
  mutate(
    sex = recode_factor(sex, 
                        "1" = "Male", 
                        "2" = "Female"),
    marst = recode_factor(marst, 
                        "10" = "Married", "11" = "Married", "12" = "Married", "13" = "Married",
                        "20" = "Widowed",
                        "30" = "Divorced",
                        "40" = "Separated",
                        "50" = "Never married"),
    poverty = recode_factor(poverty, 
                        "11" = "Less than 1.0", "12" = "Less than 1.0", 
                        "13" = "Less than 1.0", "14" = "Less than 1.0",
                        "21" = "1.0-2.0", "22" = "1.0-2.0", 
                        "23" = "1.0-2.0", "24" = "1.0-2.0", 
                        "25" = "1.0-2.0",
                        "31" = "2.0 and above","32" = "2.0 and above",
                        "33" = "2.0 and above","34" = "2.0 and above",
                        "35" = "2.0 and above","36" = "2.0 and above",
                        "37" = "2.0 and above","38" = "2.0 and above"),
    worrx = recode_factor(worrx,
                          '1' = "no", 
                          '2' = "yes"),
    worfreq = recode_factor(worfreq, 
                            '1' = "Daily", 
                            '2' = "Weekly", 
                            '3' = "Monthly", 
                            '4' = "A few times a year", 
                            '5' = "Never"),
    worfeelevl = recode_factor(worfeelevl, 
                               '1' = "A lot", 
                               '3' = "Somewhere between a little and a lot", 
                               '2' = "A little"),
    deprx = recode_factor(deprx, '1' = "no", '2' = "yes"),
    depfreq = recode_factor(depfreq, '1' = "Daily", '2' = "Weekly", 
                            '3' = "Monthly", '4' = "A few times a year", 
                            '5' = "Never"),
    depfeelevl = recode_factor(depfeelevl, '1' = "A lot", 
                               '3' = "Somewhere between a little and a lot", 
                               '2' = "A little")
    ) %>% 
  filter_at(vars(worrx, worfreq, worfeelevl, deprx, depfreq, depfeelevl), any_vars(!is.na(.)))

Mental Illness

Load and clean data

mental_df = 
  read_csv("./data/mental_data.csv") %>% 
  janitor::clean_names() %>% 
  mutate(
    any_mental_num = any_mental_num / 1000000,
    any_mental_per = any_mental_per * 100,
    ser_mental_num = ser_mental_num / 1000000,
    ser_mental_per = ser_mental_per * 100,
    state_abb = state.abb[match(state, state.name)],
    region = state.region[match(state, state.name)]
  ) %>% 
  mutate(
    state_abb = replace(state_abb, state == "District of Columbia", "DC"))

Map: Percent of adults reporting any mental illness by state between 2019-2020

state_mental=
  plot_usmap(
    data = mental_df,
    regions = "state",
    values = "any_mental_per", 
    labels = TRUE, label_color = "white") +
  labs(
    title = "Percent of adults reporting any mental illness for each state, 2019-2022"
  ) +
  scale_fill_continuous(
    name = "Mental illness percent (%)",
    label = scales::comma) +
  theme(legend.position = "right")

ggplotly(state_mental)

According to the mental health data collected between 2019 -2020, the mental illness percents are high in the US overal, with variations between states.

Any/Serious Mental illness numbers (million), by region, 2019-2020

any_mental_plot = 
  mental_df %>% 
    group_by(region) %>%
    drop_na() %>% 
    summarize(any_mental_num = sum(any_mental_num)) %>% 
    ggplot(
      aes(x = region, y = any_mental_num, fill = region)) +
    geom_bar(stat = "identity") +
    labs(
      title = "Any Mental Illness Number, by Region, 2019-2020",
      x = "Region",
      y = "Mental illness number (million)",
      fill = "Region") +
  theme(legend.position = "bottom")

ser_mental_plot =
  mental_df %>% 
    group_by(region) %>%
    drop_na() %>% 
    summarize(ser_mental_num = sum(ser_mental_num)) %>% 
    ggplot(
      aes(x = region, y = ser_mental_num, fill = region)) +
    geom_bar(stat = "identity") +
    labs(
      title = "Serious Mental Illness Number, by Region, 2019-2020",
      x = "Region",
      y = "Mental illness number (million)",
      fill = "Region") +
    theme(legend.position = "bottom")

grid.arrange(any_mental_plot, ser_mental_plot, ncol =2) 

Comment Both any mental illness and serious mental illness are highest in the South, lowest in the northeast.

Any/Serious Mental illness percent, Top 10 states, 2019-2020

any_top10_plot =
  mental_df %>% 
    filter(row_number(desc(any_mental_per)) <= 10) %>% 
    mutate(
      state = fct_reorder(state, any_mental_per)
    ) %>% 
    ggplot(
      aes(x = any_mental_per, y = state, fill = state)) +
      geom_bar(stat = "identity") +
      labs(
        title = "Any Mental Illness Percent, Top 10 States",
        x = "Any Mental illness percent (%)",
        y = "State",
        fill = "State") +
    theme(legend.position = "bottom")

ser_top10_plot =
  mental_df %>% 
    filter(row_number(desc(ser_mental_per)) <= 10) %>% 
    mutate(
      state = fct_reorder(state, ser_mental_per)
    ) %>% 
    ggplot(
      aes(x = ser_mental_per, y = state, fill = state)) +
      geom_bar(stat = "identity") +
      labs(
        title = "Serious Mental Illness Percent, Top 10 States",
        x = "Serious mental illness percent (%)",
        y = "State",
      fill = "State") +
    theme(legend.position = "bottom")

grid.arrange(any_top10_plot, ser_top10_plot, ncol =2) 

Comment

The top 10 states for any and serious mental illness are 8/10 the same, except Washington, Rhode Island, Arkansas and Indiana. Ultah has the highest any/serious mental illness percent.

Anxiety Trend

Percentage of people reported taken medication for worried, nervous, or anxious feelings

According to the plot, from 2015 to 2021, the percentage of people who report taking medication for worry, stress or anxiety is constantly increasing from 9.13% in 2015 to 13.57% in 2021. We can observe a rapid increase from 2017 to 2019 and, contrary to our expectations, a relatively slow increase from 2019 to 2020. The effect of COVID-19 on anxiety is not evident in this plot.

anx_dep %>%
  drop_na(worrx) %>% 
  group_by(year, worrx) %>% 
  summarize(wor_num = n()) %>% 
  pivot_wider(
    names_from = worrx,
    values_from = wor_num
  ) %>% 
  mutate(
    wor_percentage = yes/(no + yes)*100,
    text_label = str_c(yes, " out of ", no + yes)
  ) %>% 
  ungroup() %>% 
  plot_ly(
    y = ~wor_percentage,
    x = ~year,
    color = ~year,
    type = "bar", 
    colors = "viridis",
    text = ~text_label
  ) %>% 
  layout(
    xaxis = list (title = ""),
    yaxis = list (title = "Percentage"),
    showlegend = FALSE
  ) %>% 
  hide_colorbar()

Stratify by Biological sex

Stratify the reported percentage of people taking medication for worried, nervous, or anxious feelings by biological sex, we can observe a much higher percentage among females than males. There is also a faster increase in the percentage among females from 14.41% in 2018 to 16.52% in 2019. This increase may indicate COVID-19-induced anxiety in females. Among males, the percentage is relatively stable from 2018 to 2020, while there is an increase from 2020 to 2021.

anx_dep %>%
  drop_na(sex, worrx) %>% 
  group_by(sex, year, worrx) %>% 
  summarize(wor_num = n()) %>% 
  pivot_wider(
    names_from = worrx,
    values_from = wor_num
  ) %>% 
  mutate(
    wor_percentage = yes/(no + yes)*100,
    text_label = str_c(yes, " out of ", no + yes)
  ) %>% 
  ungroup() %>% 
  plot_ly(
    y = ~wor_percentage,
    x = ~year,
    color = ~sex,
    type = "bar", 
    colors = "viridis",
    text = ~text_label
  ) %>% 
  add_trace(
    x = ~year,
    y = ~wor_percentage,
    color = ~sex,
    type='scatter',
    mode='lines+markers'
  ) %>% 
  layout(
    xaxis = list (title = ""),
    yaxis = list (title = "Percentage"),
    legend = list(orientation = 'h')
  )

Stratify by ratio of family income to poverty threshold

Stratify the percentage of people reported taken medication for worried, nervous, or anxious feelings by the ratio of household income to the poverty line, we can clearly see that the poorer the household, the higher their percentage. The percentage among the poorest stratum decreased rapidly from 17.30% in 2017 to 15.84% in 2018, which is the opposite of what happened in the other two strata. Although the percentage of the poorest strata decreased rapidly from 2017 to 2018, they still had the highest percentage of the three strata, and this decrease was followed by a rapid increase from 15.84% in 2018 to 18.58% in 2019, which may indicate that people belonging to the poorest stratum are more susceptible to anxiety caused by COVID-19. From 2020 to 2021, the percentage decreases for the other two strata, while for the richest strata, the percentage steadily increases.

anx_dep %>%
  drop_na(poverty, worrx) %>% 
  group_by(poverty, year, worrx) %>% 
  summarize(wor_num = n()) %>% 
  pivot_wider(
    names_from = worrx,
    values_from = wor_num
  ) %>% 
  mutate(
    wor_percentage = yes/(no + yes)*100,
    text_label = str_c(yes, " out of ", no + yes)
  ) %>% 
  ungroup() %>% 
  plot_ly(
    y = ~wor_percentage,
    x = ~year,
    color = ~poverty,
    type = "scatter", 
    mode = "lines+markers",
    colors = "viridis",
    text = ~text_label
  ) %>% 
  layout(
    xaxis = list (title = ""),
    yaxis = list (title = "Percentage"),
    legend = list(orientation = 'h')
  )

Stratify by current martial status

Stratify the percentage of people reported taken medication for worried, nervous, or anxious feelings by current martial status, we can observe a rapid increase from 10.61% in 2017 to 15.60% in 2019 among those widowed, while it is difficult to tell whether this increase is caused by COVID-19 as it starts at 2017. There is also a rapid increase among those who are separated, from 14.31% in 2019 to 17.49% in 2020. Considering the timing, this could be an effect of COVID-19. The trends are similar for married and never married.

anx_dep %>%
  drop_na(marst, worrx) %>% 
  group_by(marst, year, worrx) %>% 
  summarize(wor_num = n()) %>% 
  pivot_wider(
    names_from = worrx,
    values_from = wor_num
  ) %>% 
  mutate(
    wor_percentage = yes/(no + yes)*100,
    text_label = str_c(yes, " out of ", no + yes)
  ) %>% 
  ungroup() %>% 
  plot_ly(
    y = ~wor_percentage,
    x = ~year,
    color = ~marst,
    type = "scatter",
    mode='lines+markers',
    colors = "viridis",
    text = ~text_label
  ) %>% 
  layout(
    xaxis = list (title = ""),
    yaxis = list (title = "Percentage"),
    legend = list(orientation = 'h')
  )

Frequency of worried, nervous, or anxious feelings

From this bar plot about how often people feel worried, nervous, or anxious, we can observe that the frequency is steadily increasing from 2015 to 2021. There is also a rapid increase from 2019 to 2020, which could be caused by COVID-19.

anx_dep %>% 
  drop_na(worfreq) %>% 
  group_by(year, worfreq) %>% 
  summarize(count = n()) %>% 
  group_by(year) %>% 
  summarize(
     percentage=100 * count/sum(count),
     sum_count = sum(count),
     worfreq = worfreq,
     count=count
  ) %>% 
  mutate(
    text_label = str_c(count, " out of ", sum_count)
  ) %>% 
  plot_ly(
    y = ~percentage,
    x = ~year,
    color = ~worfreq,
    type = "bar", 
    colors = "viridis",
    text = ~text_label
  ) %>% 
  layout(
    xaxis = list (title = ""),
    yaxis = list (title = "Percentage"), 
    barmode = 'stack',
    legend = list(orientation = 'h')
  )

Level of worried, nervous, or anxious feelings

From this bar plot about the level of worried, nervous, or anxious feelings people felt last time, we can observe a relatively large increase from 2018 to 2019 in the percentage of people who felt worried, stressed, or anxious a lot or between a little and a lot, which could be an effect of COVID-19.

anx_dep %>%
  drop_na(worfeelevl) %>% 
  group_by(year, worfeelevl) %>% 
  summarize(count = n()) %>% 
  group_by(year) %>% 
  summarize(
     percentage=100 * count/sum(count),
     sum_count = sum(count),
     worfeelevl = worfeelevl,
     count=count
  ) %>% 
  mutate(
    text_label = str_c(count, " out of ", sum_count)
  ) %>% 
  plot_ly(
    y = ~percentage,
    x = ~year,
    color = ~worfeelevl,
    type = "bar", 
    colors = "viridis",
    text = ~text_label
  ) %>% 
  layout(
    xaxis = list (title = ""),
    yaxis = list (title = "Percentage"), 
    barmode = 'stack',
    legend = list(orientation = 'h')
  )

Depression Trend

Percentage of people reported taken medication for depression

According to the plot, the proportion of people reported taken medication for depression increased from 8.75% in 2015 to 11.42% in 2020, followed by a slight decrease from 2020 to 2021. COVID-19 appears to have a limited impact on depression.

anx_dep %>%
  drop_na(deprx) %>% 
  group_by(year, deprx) %>% 
  summarize(dep_num = n()) %>% 
  pivot_wider(
    names_from = deprx,
    values_from = dep_num
  ) %>% 
  mutate(
    dep_percentage = yes/(no + yes)*100,
    text_label = str_c(yes, " out of ", no + yes)
  ) %>% 
  ungroup() %>% 
  plot_ly(
    y = ~dep_percentage,
    x = ~year,
    color = ~year,
    type = "bar", 
    colors = "viridis",
    text = ~text_label
  ) %>% 
  layout(
    xaxis = list (title = ""),
    yaxis = list (title = "Percentage"),
    showlegend = FALSE
  ) %>% 
  hide_colorbar()

Stratify by Biological sex

Stratify the reported percentage of people taking medication for depression by biological sex, we can observe a much higher percentage among females than males. There are also a faster increase in the percentage among females from 12.68% in 2017 to 15.14% in 2020 and a decrease from 15.14% in 2020 to 14.52% in 2021. The decrease may indicate that COVID-19 can be a protective factor against depression among females. Contrary to females, the percentage slightly decreased from 2018 to 2019 and then increased from 2020 to 2021 among males.

anx_dep %>%
  drop_na(sex, deprx) %>% 
  group_by(sex, year, deprx) %>% 
  summarize(dep_num = n()) %>% 
  pivot_wider(
    names_from = deprx,
    values_from = dep_num
  ) %>% 
  mutate(
    dep_percentage = yes/(no + yes)*100,
    text_label = str_c(yes, " out of ", no + yes)
  ) %>% 
  ungroup() %>% 
  plot_ly(
    y = ~dep_percentage,
    x = ~year,
    color = ~sex,
    type = "bar", 
    colors = "viridis",
    text = ~text_label
  ) %>% 
  add_trace(
    x = ~year,
    y = ~dep_percentage,
    color = ~sex,
    type='scatter',
    mode='lines+markers'
  ) %>% 
  layout(
    xaxis = list (title = ""),
    yaxis = list (title = "Percentage"),
    legend = list(orientation = 'h')
  )

Stratify by ratio of family income to poverty threshold

Stratify the percentage of people reported taken medication for depression by the ratio of household income to the poverty line, we can clearly see that the poorer the household, the higher their percentage. The percentage among the poorest stratum decreased from 17.41% in 2017 to 16.53% in 2018, which is the opposite of what happened in the other two strata. The change in the percentage is quite stable from 2018 to 2019 among all three strata. There is a rapid increase from 17.02% in 2019 to 18.66% in 2020, which may indicate that people belonging to the poorest stratum are affected by COVID-19 related depression. From 2020 to 2021, the percentage decreases for the other two strata, while for the richest strata, the percentage increases.

anx_dep %>%
  drop_na(poverty, deprx) %>% 
  group_by(poverty, year, deprx) %>% 
  summarize(dep_num = n()) %>% 
  pivot_wider(
    names_from = deprx,
    values_from = dep_num
  ) %>% 
  mutate(
    dep_percentage = yes/(no + yes)*100,
    text_label = str_c(yes, " out of ", no + yes)
  ) %>% 
  ungroup() %>% 
  plot_ly(
    y = ~dep_percentage,
    x = ~year,
    color = ~poverty,
    type = "scatter", 
    mode = "lines+markers",
    colors = "viridis",
    text = ~text_label
  ) %>% 
  layout(
    xaxis = list (title = ""),
    yaxis = list (title = "Percentage"),
    legend = list(orientation = 'h')
  )

Stratify by current martial status

Stratify the percentage of people reported taken medication for depression by current martial status, we can observe a rapid decrease from 17.26% in 2016 to 13.12% in 2019 among separated, while this downward trend slows from 2018 to 2019 and reverses from 2019 to 2020. The trends are similar for married and never married, divorced and widowed. the effect of COVID-19 is not evident in this plot.

anx_dep %>%
  drop_na(marst, deprx) %>% 
  group_by(marst, year, deprx) %>% 
  summarize(dep_num = n()) %>% 
  pivot_wider(
    names_from = deprx,
    values_from = dep_num
  ) %>% 
  mutate(
    dep_percentage = yes/(no + yes)*100,
    text_label = str_c(yes, " out of ", no + yes)
  ) %>% 
  ungroup() %>% 
  plot_ly(
    y = ~dep_percentage,
    x = ~year,
    color = ~marst,
    type = "scatter",
    mode='lines+markers',
    colors = "viridis",
    text = ~text_label
  ) %>% 
  layout(
    xaxis = list (title = ""),
    yaxis = list (title = "Percentage"),
    legend = list(orientation = 'h')
  )

Frequency of depression

From this bar plot about how often people feel depressed, we can observe that the frequency is quite stable with a decrease from 2018 to 2019 and am increase from 2019 to 2021. There is no clear evidence of an effect of COVID-19 on the frequency of depression.

anx_dep %>% 
  drop_na(depfreq) %>% 
  group_by(year, depfreq) %>% 
  summarize(count = n()) %>% 
  group_by(year) %>% 
  summarize(
     percentage=100 * count/sum(count),
     sum_count = sum(count),
     depfreq = depfreq,
     count=count
  ) %>% 
  mutate(
    text_label = str_c(count, " out of ", sum_count)
  ) %>% 
  plot_ly(
    y = ~percentage,
    x = ~year,
    color = ~depfreq,
    type = "bar", 
    colors = "viridis",
    text = ~text_label
  ) %>% 
  layout(
    xaxis = list (title = ""),
    yaxis = list (title = "Percentage"), 
    barmode = 'stack',
    legend = list(orientation = 'h')
  )

Level of depression

From this bar plot about the level of depression last time, we can see that the percentage of people who felt a lot or between a little and a lot depression is stable over the time period and a decrease of percentage of people feel a lot depression from 2018 to 2019. There is also no clear evidence of an effect of COVID-19 on the level of depression.

anx_dep %>%
  drop_na(depfeelevl) %>% 
  group_by(year, depfeelevl) %>% 
  summarize(count = n()) %>% 
  group_by(year) %>% 
  summarize(
     percentage=100 * count/sum(count),
     sum_count = sum(count),
     depfeelevl = depfeelevl,
     count=count
  ) %>% 
  mutate(
    text_label = str_c(count, " out of ", sum_count)
  ) %>% 
  plot_ly(
    y = ~percentage,
    x = ~year,
    color = ~depfeelevl,
    type = "bar", 
    colors = "viridis",
    text = ~text_label
  ) %>% 
  layout(
    xaxis = list (title = ""),
    yaxis = list (title = "Percentage"), 
    barmode = 'stack',
    legend = list(orientation = 'h')
  )

Regression

data cleaning

nhis = 
  read_csv("data/nhis_data01.csv") %>% 
  janitor::clean_names() %>% 
  mutate(
    worrx = recode_factor(worrx,'1' = "no", '2' = "yes"),
    worfreq = recode_factor(worfreq, '1' = "Daily", '2' = "Weekly", 
                            '3' = "Monthly", '4' = "A few times a year", 
                            '5' = "Never"),
    worfeelevl = recode_factor(worfeelevl, '1' = "A lot", 
                               '3' = "Somewhere between a little and a lot", 
                               '2' = "A little"),
    deprx = recode_factor(deprx, '1' = "no", '2' = "yes"),
    depfreq = recode_factor(depfreq, '1' = "Daily", '2' = "Weekly", 
                            '3' = "Monthly", '4' = "A few times a year", 
                            '5' = "Never"),
    depfeelevl = recode_factor(depfeelevl, '1' = "A lot", 
                               '3' = "Somewhere between a little and a lot", 
                               '2' = "A little")
  )

recent_yr_df = 
  nhis %>% 
  filter(year>2019)

for plots and analysis, may be we can refe to this website: https://rpubs.com/benhorvath/logistic_regression

Worries and anxiety

logistic regression

whether taken medication for worried, nervous, or anxious feelings is associated with covid-19 adjusting for sex and family income level.

glm(worrx ~ sex + poverty + cvddiag, 
                            family=binomial(link='logit'),
                            data = recent_yr_df) %>% 
    broom::tidy(conf.int = TRUE) %>% 
    mutate(
      odds_ratio = exp(estimate),
      low = exp(conf.low),
      high = exp(conf.high)
    ) %>% 
    select(term, odds_ratio, low, high) %>% 
  knitr::kable(digits=2)
term odds_ratio low high
(Intercept) 0.08 0.07 0.09
sex 2.09 1.98 2.20
poverty 0.98 0.98 0.99
cvddiag 1.07 1.03 1.12

depression

whether taken medication for depression is associated with covid-19 adjusting for sex and family income level.

glm(deprx ~ sex + poverty + cvddiag, 
                            family=binomial(link='logit'),
                            data = recent_yr_df) %>% 
    broom::tidy(conf.int = TRUE) %>% 
    mutate(
      odds_ratio = exp(estimate),
      low = exp(conf.low),
      high = exp(conf.high)
    ) %>% 
    select(term, odds_ratio, low, high) %>% 
  knitr::kable(digits=2)
term odds_ratio low high
(Intercept) 0.08 0.07 0.09
sex 2.08 1.97 2.20
poverty 0.97 0.97 0.98
cvddiag 1.05 1.00 1.10

Suicide Analysis

Impoart and tidy dataset

suicide_df = 
  read_excel(
    "./data/suicide_data.xlsx",
    sheet = 1,
    col_names = TRUE) %>% 
  janitor::clean_names() %>% 
  mutate(
    population = (suicide_no / suicide_100k) * 100000, 
    sex = as.factor(sex),
    age = as.factor(age)
  )

average_20years = sum(suicide_df$suicide_no) / sum(suicide_df$population) * 100000

suicide_state_df = 
  read_excel(
    "./data/suicide_data.xlsx",
    sheet = 2,
    range = "A1:D351",
    skip = 1,
    col_names = TRUE) %>% 
  janitor::clean_names() %>% 
  rename(
     suicide_no = deaths,
     suicide_100k = death_rate) %>% 
  mutate(
    population = (suicide_no / suicide_100k) * 100000
  )

National trend of suicide rate, 2000-2020, (per 100K, per year)

suicide_plot = suicide_df %>% 
  group_by(year) %>% 
  summarize(
    population = sum(population),
    suicide = sum(suicide_no),
    suicide_100k = (suicide / population) * 100000
  ) %>%
  ggplot(aes(x = year, y = suicide_100k)) +
  geom_line(col = "deepskyblue", size = 1) +
  geom_point(col = "deepskyblue", size = 2) +
  geom_hline(
    yintercept = average_20years, linetype = 2, color = "red", size = 1) +
  scale_x_continuous(breaks = seq(2000, 2020, 5)) + 
  scale_y_continuous(breaks = seq(8, 18, 1)) +
  labs(title = "US National Suicide Rate (per 100K), 2020-2022",
       x = "Year", 
       y = "Suicides per 100k") 

ggplotly(suicide_plot)

Comment: The US national suicide rates increased from 2000 to 2018, then declined from 2018 to 2020. The average suicide rate from 2000 to 2020 was 14.2 per 100,000 (represented with red dot line).

suicide rate for each state, 2014-2020

suicide_state_df %>% 
  group_by(state) %>% 
  summarize(
    population = sum(population),
    suicide = sum(suicide_no),
    suicide_100k = (suicide / population) * 100000
  ) %>%
  mutate(
    state = fct_reorder(state, suicide_100k)) %>% 
  ggplot(aes(x = suicide_100k, y = state, fill = state )) +
  geom_bar(stat = "identity") +
  scale_x_continuous(breaks = seq(0, 30, 2)) +
  labs(
    title = "Suicide Rate, by State, 2014-2020", 
    x = "Suicide rate per 100k", 
    y = "State") +
  theme(legend.position = "right")

Comment: Wyoming had the highest suicide rate and New Jersey had the lowest sucide rate from 2014-2020. The top 5 states with high suicide rates were WY, AK,

Suicide rate by sex

total_sex_plot = suicide_df %>% 
  group_by(sex) %>% 
  summarize(
    population = sum(population),
    suicide = sum(suicide_no),
    suicide_100k = (suicide / population) * 100000
  ) %>% 
  ggplot(aes(x = sex, y = suicide_100k, fill = sex )) +
  geom_bar(stat = "identity") +
  scale_y_continuous(breaks = seq(0, 24, 4)) +
  labs(
    title = "National Suicide Rate, by Sex", 
    x = "Sex", 
    y = "Suicides per 100k")


year_sex_plot = suicide_df %>% 
  group_by(year, sex) %>% 
  summarize(
    population = sum(population),
    suicide = sum(suicide_no),
    suicide_100k = (suicide / population) * 100000
  ) %>% 
  ggplot(aes(x = year, y = suicide_100k, color = sex)) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  scale_y_continuous(breaks = seq(0, 30, 5)) +
  labs(
    title = "Suicide Trend Over Years, by Sex",
    x = "Year",
    y = "Suicide per 100k"
  )

grid.arrange(total_sex_plot, year_sex_plot, ncol = 2 )

Comment: Nationally, the overall suicide rate for males is about 4 times that of females. For females, suicide rates peaked in 2018 and declined since then; for males, suicide rates peaked in 2017 and declined since then. From year 2000 to 2020, the male suicide rate remained apparently higher than the female suicide rate, and the ratio was constantly about 4:1.

Suicide rate by age

total_age_plot = suicide_df %>% 
  group_by(age) %>% 
  summarize(
    population = sum(population),
    suicide = sum(suicide_no),
    suicide_100k = (suicide / population) * 100000
  ) %>% 
  ggplot(aes(x = age, y = suicide_100k, fill = age )) +
  geom_bar(stat = "identity") +
  scale_y_continuous(breaks = seq(0, 20, 2)) +
  labs(
    title = "National Suicide Rate, by Age", 
    x = "Age", 
    y = "Suicides per 100k")


year_age_plot = suicide_df %>% 
  group_by(year, age) %>% 
  summarize(
    population = sum(population),
    suicide = sum(suicide_no),
    suicide_100k = (suicide / population) * 100000
  ) %>% 
  ggplot(aes(x = year, y = suicide_100k,  color = age)) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  scale_y_continuous(breaks = seq(0, 24, 2)) +
  labs(
    title = "Suicide Trend Over Years, by Age",
    x = "Year",
    y = "Suicide per 100k"
  )

grid.arrange(total_age_plot, year_age_plot, ncol = 2 )

Comment: Nationally, aged 45-64 had the highest suicide rate, second highest group was aged 75+. The 10-14 aged group had the lowest suicide rate. From year 2000 to 2020, the suicide rate of group aged 10-14 remained roughly static and small. Suicide rates in all other age groups showed an overall upward trend. Among them, the group aged 25-44 had the largest change, roughly from 10 to 18 suicide rate per 100k. The suicide rates of those aged 45-64 and aged 65-74 started to drop since 2018.